home *** CD-ROM | disk | FTP | other *** search
- WNILS Working Group Chris Weider
- INTERNET-DRAFT Merit Network, Inc.
- <draft-ietf-wnils-whois-02.txt> Jim Fullton
- CNIDR
- Simon Spero
- UNC-Chapel Hill
- October, 1993
-
-
- Architecture of the Whois++ Index Service
-
- Status of this memo:
-
- The authors describe an architecture for indexing in distributed databases,
- and apply this to the WHOIS++ protocol.
-
-
- This document is an Internet Draft. Internet Drafts are working
- documents of the Internet Engineering Task Force (IETF), its Areas,
- and its Working Groups. Note that other groups may also distribute
- working documents as Internet Drafts.
-
- Internet Drafts are draft documents valid for a maximum of six
- months. Internet Drafts may be updated, replaced, or obsoleted
- by other documents at any time. It is not appropriate to use
- Internet Drafts as reference material or to cite them other than
- as a "working draft" or "work in progress."
-
- Please check the I-D abstract listing contained in each Internet
- Draft directory to learn the current status of this or any
- other Internet Draft.
-
- This Internet Draft expires April 26, 1994.
-
- 1. Purpose:
-
- The WHOIS++ directory service [Deutsch, et al, 1992] is intended to provide
- a simple, extensible directory service predicated on a template-based
- information model and a flexible query language. This document describes
- a general architecture designed for indexing distributed databases, and then
- applys that architecture to link together many of these WHOIS++ servers
- into a distributed, searchable wide area directory service.
-
- 2. Scope:
-
- This document details a distributed, easily maintained architecture for
- providing a unified index to a large number of distributed WHOIS++
- servers. This architecture can be used with systems other than WHOIS++ to
- provide a distributed directory service which is also searchable.
-
- 3. Motivation and Introduction:
-
- It seems clear that with the vast amount of directory information potentially
- available on the Internet, it is simply not feasible to build a centralized
- directory to serve all this information. If we are to distribute the directory
- service, the easiest (although not necessarily the best) way of building the
- directory service is to build a hierarchy of directory information collection
- agents. In this architecture, a directory query is delivered to a certain agent
- in the tree, and then handed up or down, as appropriate, so that the query
- is delivered to the agent which holds the information which fills the query.
- This approach has been tried before, most notably in some implementations of
- the X.500 standard. However, there are number of major flaws with the approach
- as it has been taken. This new Index Service is designed to fix these flaws.
-
-
- WNILS Working Group Whois++ Index Service Weider, et al.
-
-
- 3.1. The search problem
-
- One of the primary assumptions made by recent implementations of distributed
- directory services is that every entry resides in some location in a hierarch-
- ical name space. While this arrangement is ideal for reading the entry once
- one knows its location, it is not as good when one is searching for the
- location in the namespace of those entries which meet some set of criteria.
- If the only criteria we know about a desired entry are items which do not
- appear in the namespace, we are forced to do a global query. Whenever we issue
- a global query (at the root of the namespace), or a query at the top of a
- given subtree in the namespace, that query is replicated to _all_ subtrees of
- the starting point. The replication of the query to all subtrees is not
- necessarily a problem; queries are cheap. However, every server to which the
- query has been replicated must process that query, even if it has no entries
- which match the specified criteria. This part of the global query processing
- is quite expensive. A poorly designed namespace or a thin namespace can cause
- the vast majority of queries to be replicated globally, but a very broad
- namespace can cause its own navigation problems. Because of these problems,
- search has been turned off at high levels of the X.500 namespace.
-
- 3.2. The location problem
-
- With global search turned off, one must know in advance how the name space is
- laid out so that one can guide a query to a proper location. Also, the layout
- of the namespace then becomes critical to a user's ability to find the
- desired information. Thus there are endless battles about how to lay out the
- name space to best serve a given set of users, and enormous headaches whenever
- it becomes apparent that the current namespace is unsuited to the current
- usages and must be changed (as recently happened in X.500). Also, assuming
- one does impose multiple hierarchies on the entries through use of the
- namespace, the mechanisms to maintain these multiple hierarchies in X.500 do
- not exist yet, and it is possible to move entries out from under their
- pointers. Also, there is as yet no agreement on how the X.500 namespace
- should look even for the White Pages types of information that is currently
- installed in the X.500 pilot project.
-
- 3.3. The Yellow Pages problem
-
- Current implementations of this hierarchical architecture have also been
- unsuited to solving the Yellow Pages problem; that is, the problem of
- easily and flexibly building special-purpose directories (say of molecular
- biologists) and of automatically maintaining these directories once they have
- been built. In particular, the attributes appropriate to the new directory
- must be built into the namespace because that is the only way to segregate
- related entries into a place where they can be found without a global
- search. Also, there is a classification problem; how does one adequately
- specify the proper categories so that people other than the creator of the
- directory can find the correct subtree? Additionally, there is the problem
- of actually finding the data to put into the subtree; if one must traverse
- the hierarchy to find the data, we have to look globally for the proper
- entries.
-
- 3.4. Solutions
-
- The problems examined in this section can be addressed by a combination of two
- new techniques: directory meshes and forward knowledge.
-
- WNILS Working Group Whois++ Index Service Weider, et al.
-
- 4. Directory meshes and forward knowledge
-
- We'll hold off for a moment on describing the actual architecture used in
- our solution to these problems and concentrate on a high level description of
- what solutions are provided by our conceptual approach. To begin with,
- although every entry in WHOIS++ does indeed have a unique identifier
- (resides in a specific location in the namespace) the navigational algorithms
- to reach a specific entry do not necessarily depend on the identifier the
- entry has been assigned. The Index Service gets around the namespace and
- hierarchy problems by creating a directory mesh on top of the entries.
- Each layer of the mesh has a set of 'forward knowledge' which indicates the
- contents of the various servers at the next lower layer of the mesh. Thus
- when a query is received by a server in a given layer of the mesh, it can
- prune the search tree and hand the query off to only those lower level servers
- which have indicated that they might be able to answer it. Thus search becomes
- feasible at all levels of the mesh. In the current version of this
- architecture, we have chosen a certain set of information to hand up the mesh
- as forward knowledge. This may or may not be exactly the set of information
- required to construct a truly searchable directory, but the protocol itself
- doesn't restrict the types of information which can be handed around.
-
- In addition, the protocols designed to maintain the forward knowledge will
- also work perfectly well to provide replication of servers for redundancy
- and robustness. In this case, the forward knowledge handed around by the
- protocols is the entire database of entries held by the replicated server.
-
- Another benefit provided by the mesh of index servers is that since the
- entry identification scheme has been decoupled from the navigation service,
- multiple hierarchies can be built and easily maintained on top of the
- existing data. Also, the user does not need to know in advance where in the
- mesh the entry is contained.
-
- Also, the Yellow Pages problem now becomes tractable, as the index servers
- can pick and choose between information proffered by a given server;
- because we have an architecture that allows for automatic polling of data,
- special purpose directories become easy to construct and to maintain.
-
-
- 5. Components of the Index Service:
-
- 5.1. WHOIS++ servers
-
- The whois++ service is described in [Deutsch, et al, 1992]. As that service
- specifies only the query language, the information model, and the server
- responses, whois++ services can be provided by a wide variety of databases
- and directory services. However, to participate in the Index Service, that
- underlying database must also be able to generate a 'centroid', or some other
- type of forward knowledge, for the data it serves.
-
- 5.2. Centroids as forward knowledge
-
- The centroid of a server is comprised of a list of the templates and
- attributes used by that server, and a word list for each attribute.
- The word list for a given attribute contains one occurrence of every
- word which appears at least once in that attribute in some record in that
- server's data, and nothing else.
-
-
- WNILS Working Group Whois++ Index Service Weider, et al.
-
- For example, if a whois++ server contains exactly three records, as follows:
-
- Record 1 Record 2
- Template: User Template: User
- First Name: John First Name: Joe
- Last Name: Smith Last Name: Smith
- Favourite Drink: Labatt Beer Favourite Drink: Molson Beer
-
- Record 3
- Template: Domain
- Domain Name: foo.edu
- Contact Name: Mike Foobar
-
- the centroid for this server would be
-
- Template: User
- First Name: Joe
- John
- Last Name: Smith
- Favourite Drink: Beer
- Labatt
- Molson
-
- Template: Domain
- Domain Name: foo.edu
- Contact Name: Mike
- Foobar
-
- It is this information which is handed up the tree to provide forward
- knowledge. As we mention above, this may not turn out to be the ideal
- solution for forward knowledge, and we suspect that there may be a number of
- different sets of forward knowledge used in the Index Service. However, the
- directory architecture is in a very real sense independent of what types of
- forward knowledge are handed around, and it is entirely possible to build a
- unified directory which uses many types of forward knowledge.
-
-
- 5.3. Index servers and Index server Architecture
-
- A whois++ index server collects and collates the centroids (or other forward
- knowledge) of either a number of whois++ servers or of a number of other index
- servers. An index server must be able to generate a centroid for the
- information it contains. In addition, an index server can index any other
- server it wishes, which allows one base level server (or index server) to
- participate in many hierarchies in the directory mesh.
-
- 5.3.1. Queries to index servers
-
- An index server will take a query in standard whois++ format, search its
- collections of centroids, determine which servers hold records which may fill
- that query, and then either a) forward the query to the appropriate servers
- on behalf of the user, (chaining in the X.500 terminology) or b) notify the
- user's client of the next servers to contact to submit the query (referral in
- the X.500 model).
-
- 5.3.2. Index server distribution model and centroid propogation
-
- The diagram on the next page illustrates how a mesh of index servers might be
- created for a set of whois++ servers. Although it looks like a hierarchy,
- the protocols allow (for example) server A to be indexed by both server
- D and by server H.
-
-
- WNILS Working Group Whois++ Index Service Weider, et al.
-
- whois++ index index
- servers servers servers
- for for
- whois++ lower-level
- servers index servers
- _______
- | |
- | A |__
- |_______| \ _______
- \----------| |
- _______ | D |__ ______
- | | /----------|_______| \ | |
- | B |__/ \----------| |
- |_______| | F |
- /----------|______|
- /
- _______ _______ /
- | | | |-
- | C |--------------| E |
- |_______| |_______|-
- \
- \
- _______ \ ______
- | | \----------| |
- | G |--------------------------------------| H |
- |_______| |______|
-
-
- Figure 1: Sample layout of the Index Service mesh
- _______________________________________________________________________________
-
-
- In the portion of the index tree shown above, whois++ servers A and B hand
- their centroids up to index server D, whois++ server C hands its centroid up to
- index server E, and index servers D and E hand their centroids up to index
- server F. Servers E and G also hand their centroids up to H.
-
- The number of levels of index servers, and the number of index servers at each
- level, will depend on the number of whois++ servers deployed, and the response
- time of individual layers of the server tree. These numbers will have to
- be determined in the field.
-
- 5.3.3. Centroid propogation and changes to centroids
-
- Centroid propogation is initiated by an authenticated POLL command (sec. 5.2).
- The format of the POLL command allows the poller to request the centroid of
- any or all templates and attributes held by the polled server. After the
- polled server has authenticated the poller, it determines which of the
- requested centroids the poller is allowed to request, and then issues a
- CENTROID-CHANGES report (sec. 5.3) to transmit the data. When the poller
- receives the CENTROID-CHANGES report, it can authenticate the pollee to
- determine whether to add the centroid changes to its data. Additionally, if
- a given pollee knows what pollers hold centroids from the pollee, it can
- signal to those pollers the fact that its centroid has changed by issuing
- a DATA-CHANGED command. The poller can then determine if and when to
- issue a new POLL request to get the updated information. The DATA-CHANGED
- command is included in this protocol to allow 'interactive' updating of
- critical information.
-
-
- WNILS Working Group Whois++ Index Service Weider, et al.
-
- 5.3.4. Centroid propogation and mesh traversal
-
- When an index server issues a POLL request, it may indicate to the polled
- server what relationship it has to the polled. This information can be
- used to help traverse the directory mesh. Only one field is specified in the
- current proposal to transmit the relationship information, although it is
- expected that richer relationship information will be shared in future
- revisions of this protocol. The field used for this information is the
- X-hierarchy field, and can take on three values. The first is 'topology',
- which indicates that the indexing server is at a higher level in the network
- topology (e.g. indexes the whole regional ISP). The second is 'geographical',
- which indicates that the polling server covers a geographical area subsuming
- the pollee. The third is 'administrative', which indicates that
- the indexing server covers an administrative domain subsuming the pollee.
-
- 5.3.5. Query handling and passing algorithms
-
- When an index server receives a query, it searches its collection of centroids,
- and determines which servers hold records which may fill that query. As
- whois++ becomes widely deployed, it is expected that some index servers
- may specialize in indexing certain whois++ templates or perhaps even
- certain fields within those templates. If an index server obtains a match
- with the query _for those template fields and attributes the server indexes_,
- it is to be considered a match for the purpose of forwarding the query.
- There are two methods of forwarding a query, called 'chaining' and 'referral'.
-
- 5.3.5.1. Query referral
-
- Query referral is the process of informing a client which servers to contact
- next to resolve a query. The syntax for notifying a client is outlined in
- section 5.5.
-
- 5.3.5.2. Query chaining
-
- Query chaining is done when the queried index server takes responsibility for
- resubmitting the query to the appropriate lower servers. The server
- will then forward the query using the syntax in section 5.4, but then takes
- no further responsibility for the query. A whois++ query can specify the
- 'trace' option, which causes each server which receives the query to
- send its IANA handle and an identification string to the client.
-
- 6. Syntax for operations of the Index Service:
-
- 6.1. Data changed syntax
-
- The data changed template look like this:
-
- DATA-CHANGED:
- Version-number: // version number of index service software, used to insure
- // compatibility
- Time-of-latest-centroid-change: // time stamp of latest centroid change, GMT
- Time-of-message-generation: // time when this message was generated, GMT
- Server-handle: // IANA unique identifier for this server
- Best-time-to-poll: // For heavily used servers, this will identify when
- // the server is likely to be lightly loaded
- // so that response to the poll will be speedy, GMT
- Authentication-type: // Type of authentication used by server, or NONE
- Authentication-data: // data for authentication
- END DATA-CHANGED // This line must be used to terminate the data changed
- // message
-
- WNILS Working Group Whois++ Index Service Weider, et al.
-
- 6.2. Polling syntax
-
- POLL:
- Version-number: // version number of poller's index software, used to
- // insure compatibility
- Type-of-poll: // type of forward data requested. CENTROID is the only
- // one currently defined
- Start-time: // give me all the centroid changes starting at this time, GMT
- End-time: // ending at this time, GMT
- Template: // a standard whois++ template name, or the keyword ALL, for a
- // full update.
- Field: // used to limit centroid update information to specific fields,
- // is either a specific field name, a list of field names,
- // or the keyword ALL
- Server-handle: // IANA unique identifier for the polling server.
- // this handle may optionally be cached by the polled
- // server to announce future changes
- X-Hierarchy: // This field indicates the relationship which the poller
- // bears to the pollee. Typical values might include
- // 'Topology', 'Geographical", or "Administrative"
- Authentication-type: // Type of authentication used by poller, or NONE
- Authentication-data: // Data for authentication
- END POLL // This line must by used to terminate the poll message
-
- 6.3. Centroid change report
-
- As the centroid change report contains nested multiply-occuring blocks,
- each multiply occurring block is surrounded *in this paper* by curly
- braces '{', '}'. These curly braces are NOT part of the syntax, they are
- for identification purposes only.
-
- The syntax of a Data: item is either a list of words, one word per line, with
- the syntax:
-
- word <tab> weight
-
- or the keyword:
-
- ANY
-
- . The weight is not required, but is expected to be used by weighting servers.
- The keyword ANY as the only item of a Data: list means that any value for
- this field should be treated as a hit by the indexing server.
-
- The field Any-field: needs more explanation than can be given in the body
- of the syntax description below. It can take two values, True or False. If
- the value is True, the pollee is indicating that there are fields in this
- template which are not being exported to the polling server, but wishes to
- treat as a hit. Thus, when the polling server gets a query which has a term
- requesting a field not in this list for this template, the polling server
- will treat that term as a 'hit'. If the value is False, the pollee is
- indicating that there are no other fields for this template which should be
- treated as a hit. This field is required because the basic model for the
- WHOIS++ query syntax requires that the results of each search term be 'and'ed
- together. This field allows polled servers to export data only for
- non-sensitive fields, yet still get referrals of queries which contain
- sensitive terms.
-
- CENTROID-CHANGES:
- Version-number: // version number of pollee's index software, used to
- // insure compatibility
- Start-time: // change list starting time, GMT
- End-time: // change list ending time, GMT
- Server-handle: // IANA unique identifier of the responding server
- Case-sensitive: // states whether data is case sensitive or case
- // insensitive. values are TRUE or FALSE
- Authentication-type: // Type of authentication used by pollee, or NONE
- Authentication-data: // Data for authentication
- Compression-type: // Type of compression used on the data, or NONE
- Size-of-compressed-data: // size of compressed data if compression is used
- Operation: // One of 3 keywords: ADD, DELETE, FULL
- // ADD - add these entries to the centroid for this server
- // DELETE - delete these entries from the centroid of this
- // server
- // FULL - the full centroid as of end-time follows
- { // The multiply occurring template block starts here
- BEGIN TEMPLATE
- Template: // a standard whois++ template name
- Any-field: // TRUE or FALSE. See beginning of 6.3 for explanation.
- { // the template contains multiple field blocks
- BEGIN FIELD
- Field: // a field name within that template
- Data: // Either the keyword ANY, or
- // the word list itself, one per line, cr/lf terminated
- // each word may be optionally followed by a <tab> and a weight.
- END FIELD
- } // the field ends with END FIELD
- END TEMPLATE
- } // the template block ends with END TEMPLATE
- END CENTROID-CHANGES // This line must be used to terminate the centroid
- // change report
-
- 6.4. Forwarded query
-
- FORWARDED-QUERY:
- Version-number: // version number of forwarder's index software, used to
- // insure compatibility
- Forwarded-From: // IANA unique identifier of the server forwarding query
- Forwarded-time: // time this query forwarded, GMT (used for debugging)
- Trace-option: // YES if query has 'trace' option listed, NO if not.
- // used at message reception time to generate trace
- // information
- Query-origination-address: // address of origin of query
- Body-of-Query: // The original query goes here
- Authentication-type: // Type of authentication used by queryer
- Authentication-data: // Data for authentication
- END FORWARDED-QUERY // This line must be used to terminate the body of the
- // query
-
- 6.5. Query referral
-
- SERVERS-TO-ASK:
- Version-number: // version number of index software, used to insure
- // compatibility
- Query-id: // some query identifier so the client knows which query to
- // issue to the following servers
- Body-of-Query: // the original query goes here
- Next-Servers: // A list of servers to ask next, either IP addresses or
- // hostnames, one per line, cr/lf terminated
- END SERVERS-TO-ASK
-
- 7. References
-
- Deutsch, et al. Architecture of the WHOIS++ service. August 1992.
- Available by anonymous FTP as
- ucdavis.edu://pub/archive/wnils/Architecture.Overview
-
-
- WNILS Working Group Whois++ Index Service Weider, et al.
-
- 8. Author's Addresses
-
- Chris Weider
- clw@merit.edu
- Industrial Technology Institute, Pod G
- 2901 Hubbard Rd,
- Ann Arbor, MI 48105
- O: (313) 747-2730
- F: (313) 747-3185
-
- Jim Fullton
- fullton@cnidr.org
- MCNC Center for Communications
- Post Office Box 12889
- 3021 Cornwallis Road
- Research Triangle Park
- North Carolina 27709-2889
- O: 919-248-1499
- F: 919-248-1405
-
- Simon Spero
- ses@sunsite.unc.edu
- 310 Wilson Library CB #3460
- University of North Carolina
- Chapel Hill, NC 27599-3460
- O: (919) 962-9107
- F: (919) 962-5604
-
-